
 value concept




CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses

arXiv.org Artificial Intelligence

The rapid progress in Large Language Models (LLMs) poses potential risks such as generating unethical content. Assessing LLMs' values can help expose their misalignment, but relies on reference-free evaluators, e.g., fine-tuned LLMs or closed-source ones like GPT-4, to identify values reflected in generated responses. Nevertheless, these evaluators face two challenges in open-ended value evaluation: they should align with changing human value definitions with minimal annotation, against their own bias (adaptability), and detect varying value expressions and scenarios robustly (generalizability). To handle these challenges, we introduce CLAVE, a novel framework which integrates two complementary LLMs: a large one to extract high-level value concepts from a few human labels, leveraging its extensive knowledge and generalizability, and a smaller one fine-tuned on such concepts to better align with human value understanding. This dual-model approach enables calibration with any value system using <100 human-labeled samples per value type. We also present ValEval, a comprehensive dataset comprising 13k+ (text, value, label) tuples across diverse domains, covering three major value systems. We benchmark the capabilities of 12+ popular LLM evaluators and analyze their strengths and weaknesses. Our findings reveal that combining fine-tuned small models and prompt-based large ones offers a superior balance in value evaluation.


Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?

arXiv.org Artificial Intelligence

Prior research in representation engineering has revealed that LLMs encode concepts within their representation spaces, predominantly centered around English. In this study, we extend this philosophy to a multilingual scenario, delving into multilingual human value concepts in LLMs. Through our comprehensive exploration covering 7 types of human values, 16 languages and 3 LLM series with distinct multilinguality, we empirically substantiate the existence of multilingual human values in LLMs. Further cross-lingual analysis on these concepts discloses 3 traits arising from language resource disparities: cross-lingual inconsistency, distorted linguistic relationships, and unidirectional cross-lingual transfer between high- and low-resource languages, all in terms of human value concepts. Additionally, we validate the feasibility of cross-lingual control over value alignment capabilities of LLMs, leveraging the dominant language as a source language. Drawing from our findings on multilingual value alignment, we offer cautious suggestions on the composition of multilingual data for LLM pre-training: including a limited number of dominant languages for cross-lingual alignment transfer while avoiding their excessive prevalence, and keeping a balanced distribution of non-dominant languages. We hope that our findings will contribute to enhancing the safety and utility of multilingual AI.
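
As a rough illustration of the kind of cross-lingual comparison the abstract describes, the sketch below assumes that a value concept in one language can be summarised as a difference-of-means direction between hidden states of value-aligned and value-violating texts, and that consistency between two languages can be measured by cosine similarity. Both assumptions, all function names, and the toy three-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def concept_vector(positive, negative):
    """Difference-of-means direction (an assumed way to summarise a concept)."""
    mean_pos = mean_vector(positive)
    mean_neg = mean_vector(negative)
    return [p - n for p, n in zip(mean_pos, mean_neg)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up stand-ins for hidden states; real ones would come from an LLM layer.
lang_a_pos = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
lang_a_neg = [[-1.0, 0.0, 0.0], [-0.8, -0.2, 0.0]]
lang_b_pos = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]]
lang_b_neg = [[-0.9, -0.1, 0.0], [-0.7, -0.3, 0.0]]

concept_a = concept_vector(lang_a_pos, lang_a_neg)
concept_b = concept_vector(lang_b_pos, lang_b_neg)

# A value near 1 would suggest the two languages encode the concept similarly.
consistency = cosine(concept_a, concept_b)
print(f"cross-lingual consistency: {consistency:.3f}")
```

Under these assumptions, a low similarity between two languages' concept directions would correspond to the cross-lingual inconsistency the abstract mentions.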


Modelling Human Values for AI Reasoning

arXiv.org Artificial Intelligence

In academia, a growing body of research investigates the role of human values in designing ethical AI [12, 31, 74, 90]. Indeed, one of our leading AI research luminaries, Stuart Russell, believes the overarching goal of AI should change from "intelligence" to "intelligence provably aligned with human values" [74]. This call to arms gave birth to the value alignment problem. This challenge of engineering values into AI in response to the value alignment problem has resulted in a range of research areas: how human values can be learnt [43, 44, 45, 91]; how individual values can be aggregated to the level of groups [41]; how arguments that explicitly reference values can be made [7]; how decision making can be value-driven [14, 17, 21]; how online institutions can ensure value-aligned behaviours in hybrid communities [56, 57]; and how norms are selected or synthesised to maximise value-alignment [55, 80, 83]. Yet despite these efforts, no formal model of values exists today that provides a concrete foundational platform from which data structures and algorithms can be designed to build AI architectures that address the value-alignment problem.
In response, we propose such a model built on the following guiding principles: 1) we employ a formal language to be precise about modelling values and related concepts [23, 47]; 2) we construct the formal components of this model to provide the foundations for the data structures and algorithmic design that will enable value-based reasoning; 3) we design the model to be agnostic on any specific implementation of values, though we do provide example implementation scenarios to illustrate the model's ubiquity and practical applicability; 4) we set out the model to subsume and relate to established concepts in AI research as much as possible; 5) we provide illustrative examples of building data structures and algorithms enabling value-based reasoning taken from our ongoing research applied to real-world use cases; 6) we ensure the model draws upon the wealth of work from within social psychology and explicitly demonstrate the grounding of our model within this research; and


Exploring Values in Museum Artifacts in the SPICE project: a Preliminary Study

arXiv.org Artificial Intelligence

This document describes the rationale, the implementation and a preliminary evaluation of a semantic reasoning tool developed in the EU H2020 SPICE project to enhance the diversity of perspectives experienced by museum visitors. The tool, called DEGARI 2.0 for values, relies on the commonsense reasoning framework TCL, and exploits an ontological model formalizing Haidt's theory of moral values to associate museum items with combined values and emotions. Within a museum exhibition, this tool can suggest cultural items that are associated not only with the values of already experienced or preferred objects, but also with novel items with different value stances, opening the visit experience to more inclusive interpretations of cultural content. The system has been preliminarily tested, in the context of the SPICE project, on the collection of the Hecht Museum of Haifa.


Interactive Task and Concept Learning from Natural Language Instructions and GUI Demonstrations

arXiv.org Artificial Intelligence

Natural language programming is a promising approach to enable end users to instruct new tasks for intelligent agents. However, our formative study found that end users often use unclear, ambiguous or vague concepts when instructing tasks in natural language, especially when specifying conditionals. Existing systems have limited support for letting the user teach agents new concepts or explain unclear concepts. In this paper, we describe a new multi-modal domain-independent approach that combines natural language programming and programming-by-demonstration to allow users to first naturally describe tasks and associated conditions at a high level, and then collaborate with the agent to recursively resolve any ambiguities or vagueness through conversations and demonstrations. Users can also define new procedures and concepts by demonstrating and referring to contents within GUIs of existing mobile apps. We demonstrate this approach in PUMICE, an end-user programmable agent that implements it. A lab study with 10 users showed its usability.
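
The idea of recursively resolving unclear concepts can be sketched as a small loop: whenever the agent meets a concept it does not know, it asks the user to explain it, then resolves every concept used in that explanation in the same way. The sketch below is only an illustration under strong assumptions: the user is simulated by a dictionary of scripted answers rather than by conversation or GUI demonstration, and the names `resolve`, `known`, and `ask_user` are hypothetical and do not come from PUMICE.

```python
def resolve(concept, known, ask_user, _active=None):
    """Ensure `concept` and everything it depends on is defined in `known`.

    `known` maps a concept name to the list of concepts its definition uses
    (an empty list means the concept is already grounded).
    `ask_user` is a callback that returns that list for an unknown concept.
    Returns the names of the concepts learned during this call, in order.
    """
    _active = set() if _active is None else _active
    if concept in _active:
        raise ValueError(f"circular definition involving {concept!r}")

    learned = []
    if concept not in known:
        # Unknown concept: ask the (simulated) user to explain it.
        known[concept] = list(ask_user(concept))
        learned.append(concept)

    # Recursively resolve every concept the definition refers to.
    _active.add(concept)
    for part in known[concept]:
        learned.extend(resolve(part, known, ask_user, _active))
    _active.discard(concept)
    return learned


# The agent initially knows only how to read a temperature.
known = {"temperature": []}

# Scripted stand-in for the user's explanations.
scripted_answers = {
    "hot": ["temperature", "threshold"],
    "threshold": [],
}

learned = resolve("hot", known, scripted_answers.__getitem__)
print(learned)
```

In a real system the callback would be replaced by a dialogue turn or a demonstration, and a definition would carry more than a list of names; the recursion itself is the part this sketch is meant to show.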